It has previously been shown that by adding a pseudo-random "dither" 
noise to a signal to be quantized, and by subtracting an identical noise 
sequence from the quantizer output, it is possible to break up undesirable 
signal-dependent patterns in the quantization error sequence without 
increasing the variance of the error. The effect of the dither noise becomes 
significant when the number of bits per sample is less than about six. An 
experimental evaluation of the perceptual effects of dither on speech has 
shown: 

(i) strong preferences for dithered speech over straight PCM encoding 
    at identical bit rates, 
(ii) for low bit rates (2-4 bits/sample), a preference for dithered speech 
    over PCM encoded speech even when the PCM speech had one more 
    bit per sample than the dithered speech, 
(iii) an increase in word intelligibility for dithered speech over PCM 
    speech when 4 to 6 bits/sample were used, 
(iv) a decrease in word intelligibility for dithered speech over PCM 
    speech when 2 to 3 bits/sample were used. 

I. INTRODUCTION 

When a signal, such as a speech waveform, is quantized, the quantiza- 
tion error waveform is usually correlated with the original signal. This 
correlation is virtually imperceptible when the quantization is quite 
fine, i.e., with a large number of bits/sample. For crude quantizations, 
however, the correlation becomes quite large and the quantization error 
is easily perceived. As a result, it can become quite disturbing to listen 
to speech quantized to a low number of bits/sample for an extended 
period of time. In such cases, techniques that decorrelate the quantiza- 
tion error from the signal are attractive, even if they do not increase 
the signal-to-noise ratio of the system. Dithering is such a technique in 
which a pseudo-random "dither" noise is added to the speech before 
quantizing, and then the identical noise is subtracted, producing a 
quantization error which is uncorrelated with the original speech 
waveform.¹ Figure 1 shows a comparison between straight PCM and a 
system in which dithering is used. In an earlier work, Jayant and 
Rabiner² discussed several theoretical issues involved with dithering 
and demonstrated its utility for the quantization of speech signals. In 
this paper, we present experimental results on the perceptual effects of 
dither on both the preference and intelligibility of PCM encoded speech. 
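
The two systems compared in Fig. 1 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a uniform mid-rise quantizer over [-1, 1), dither drawn uniformly over one quantizer step, and a 440 Hz test tone sampled at 10 kHz standing in for speech. All function names are hypothetical. The sketch measures how strongly the quantization error is correlated with the signal in each system.

```python
import math
import random


def quantize(x, bits, full_scale=1.0):
    """Uniform mid-rise quantizer with 2**bits levels over [-full_scale, full_scale)."""
    step = 2.0 * full_scale / (2 ** bits)
    half_levels = 2 ** (bits - 1)
    index = math.floor(x / step)
    index = max(-half_levels, min(half_levels - 1, index))  # clip on overload
    return step * (index + 0.5)


def pcm(signal, bits):
    """Straight PCM: quantize each sample directly."""
    return [quantize(x, bits) for x in signal]


def dithered_pcm(signal, bits, seed=0):
    """Add dither before quantizing, then subtract the identical dither."""
    step = 2.0 / (2 ** bits)
    rng = random.Random(seed)  # the receiver is assumed to regenerate this sequence
    out = []
    for x in signal:
        d = rng.uniform(-step / 2, step / 2)  # assumed dither distribution
        out.append(quantize(x + d, bits) - d)
    return out


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


BITS = 2
# Test tone at half of full scale, so that signal plus dither never overloads.
signal = [0.5 * math.sin(2 * math.pi * 440 * n / 10000) for n in range(8000)]
err_pcm = [y - x for x, y in zip(signal, pcm(signal, BITS))]
err_dither = [y - x for x, y in zip(signal, dithered_pcm(signal, BITS))]

print("PCM    error/signal correlation:", round(correlation(signal, err_pcm), 3))
print("Dither error/signal correlation:", round(correlation(signal, err_dither), 3))
```

Under these assumptions the straight PCM error at 2 bits/sample is strongly correlated with the tone, whereas the dithered error is nearly uncorrelated with it and never exceeds half a quantizer step in magnitude.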



II. PREFERENCE EVALUATION TEST 

The purpose of this experiment was to determine the perceptibility 
of the decrease in correlation between the quantization error and the 
original speech, as a function of the number of signal bits. 

The stimuli used in the preference test were a set of ten sentences 
chosen from a list of "everyday speech" sentences³ compiled at the 
Central Institute for the Deaf. The sentences used are shown in Table I. 
These ten sentences were spoken by a General American speaker, 
digitized at a 10 kHz rate with 16 bits/sample, and stored on the disc 
of the DDP-516 computer. 

In order to limit the number of stimuli to be used in the paired- 
comparisons preference test, the number of bits/sample was restricted 
to the range of 2 to 6 bits. Therefore, there were ten distinct stimuli in 
the test, i.e., (five possible values for the number of bits) × (two types 
of quantization: dither or straight PCM). For notational convenience, 
the stimuli were coded using a two-digit code. The first digit refers to 
the number of bits/sample (i.e., 2-6) and the second digit specifies the 
type of quantization. A 0 in the second digit means straight PCM 
encoding, whereas a 1 in the second digit means dithered speech. Thus 
stimulus 31 has 3 bits/sample and uses the dither noise, whereas stimulus 
50 has 5 bits/sample and does not use dither noise. 

Fig. 1. Block diagrams of a straight PCM system and a dither system. 

Table I. Sentences Used in Preference Test 

1. Walking's my favorite exercise. 
2. Here's a nice quiet place to rest. 
3. Our janitor sweeps the floor every night. 
4. It would be much easier if everyone would help. 
5. Good morning. 
6. Open your windows before you go to bed. 
7. Do you think she should stay out so late. 
8. How do you feel about changing the time when we begin work. 
9. Here we go! 
10. Move out of the way. 

Since there were ten distinct conditions to be evaluated, a complete 
paired-comparison preference test involved 100 pairs. These 100 pairs 
were randomly generated by a DDP-516 program which randomly 
accessed each of the ten stimulus sentences ten times in the course of 
the experiment. Each of the 100 stimulus pairs was recorded on magnetic 
tape for offline running of the experiment. 
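
The stimulus coding and the 100-pair design described above can be sketched as follows. This is an illustration only, not the DDP-516 program. It assumes that the complete design presents every ordered (A, B) pair of the ten conditions, identical pairs included, and that each pair is spoken on a single sentence so that every sentence is used ten times. The variable names and the random seed are hypothetical.

```python
import itertools
import random

# Two-digit stimulus codes: first digit = bits/sample (2-6),
# second digit = 0 for straight PCM, 1 for dither.
codes = [f"{bits}{dither}" for bits in range(2, 7) for dither in (0, 1)]

# Assumption: every ordered (A, B) pair of conditions is presented once,
# identical pairs included, giving 10 x 10 = 100 pairs.
pairs = list(itertools.product(codes, repeat=2))

rng = random.Random(0)  # arbitrary seed, used only to make the sketch repeatable
rng.shuffle(pairs)

# Assumption: one sentence per pair, each of the ten sentences used ten times.
sentences = list(range(1, 11)) * 10
rng.shuffle(sentences)

trials = [(sentence, a, b) for sentence, (a, b) in zip(sentences, pairs)]
print(len(trials), "trials; first three:", trials[:3])
```

Each trial is a tuple of (sentence number from Table I, code of stimulus A, code of stimulus B).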

Ten subjects participated in the experiment. Each subject was given 
the following instructions: 

"In this test you will be listening to pairs of sentences. Each of the 
two sentences (first is called A, second B) was processed by some 
type of speech transmission system. After you hear both sentences, 
there is a five-second interval in which you are to write down the 
sentence, A or B, you prefer, i.e., the type of transmission system 
you would prefer listening to for an extended period of time. You 
must choose either A or B, even if you have no preference." 

The preference test required two 15-minute listening sessions per subject 
and was run on two separate days. 

III. RESULTS OF PREFERENCE TEST 

For each of the ten subjects, a matrix of preferences was determined 
in which a 1 in a particular cell of the matrix denoted that stimulus B 
was preferred to stimulus A, and a 0 indicated the reverse condition. 
Table II shows the matrix obtained by summing the matrices for the 
ten subjects. Careful inspection of this matrix shows a strong preference 
for dithered speech over straight PCM encoding at a fixed number of 
bits/sample, and, in many cases, a preference for dithered speech at 
L bits/sample (L = 2-4) over straight PCM encoded speech at (L + 1) 
bits/sample. 

Table II. Matrix of Sum of Preferences for Paired Comparison Preference Test 
Stimulus B 
20 21 30 31 40 41 50 51 60 61 

To verify these preference results, the data were analyzed using a 
multidimensional preference program of Carroll.⁴ The program indicated 
that the preferences were essentially one-dimensional (over 95 percent 
of the variance was accounted for by one dimension), and produced a 
graphical interpretation of the overall preferences which is shown in 
Fig. 2. Since the preference judgments were one-dimensional, all the 
conditions lie on a line. The direction of preference goes from left to 
right in terms of decreasing preference. Figure 2 clearly shows: 

(i) For a fixed number of bits/sample, the dithered speech samples 
    are always preferred to straight PCM encoding. 
(ii) For 2-4 bits/sample, dithered speech is preferred to straight 
    PCM encoding even with one extra bit/sample, i.e., condition 41 
    is preferred to condition 50, condition 31 is preferred to 
    condition 40, and condition 21 is preferred to condition 30. 

Thus in some perceptual sense, dithered PCM speech has a one-bit 
advantage over straight PCM encoding under certain conditions. This, 
of course, is not correct in terms of physical measures such as 
signal-to-noise ratio, or, as we will see, word intelligibility. 

Fig. 2. Ordering of the stimuli in terms of preference. 

A complete analysis of variance was performed on the preference data 
and the results of this analysis are shown in Table III. The three factors 
and the number of levels of each are: 

(i) number of bits/sample (5) 
(ii) type of quantization (2) 
(iii) subjects (10) 

The analysis reconfirms the conclusions already discussed in that the 
most significant effects (significant beyond the 0.999 level) were number 
of bits/sample and type of quantization. Subjects were significant at the 
0.95 level, and the interaction between bits and dither was also 
significant at this level. 

IV. WORD INTELLIGIBILITY TEST 

The purpose of the intelligibility test was to determine the effects 
of dithering on the intelligibility of isolated monosyllables. As discussed 
earlier, the effect of dither is to make the quantization noise act like 
an additive wideband uncorrelated noise. Earlier studies⁵ have indicated 
that such a noise tends to mask consonants, thereby lowering 
intelligibility. The effect of the correlated quantization noise of straight 
PCM encoding on word intelligibility was also measured. 

In this experiment, 200 PB words⁶ (Lists 2, 4, 5 and 6 in Ref. 5) were 
recorded, digitized, and stored on the disc of the DDP-516. The words 
were accessed at random, in groups of 50 (i.e., an entire list was processed 
before a new list was used), by one of the ten systems used in the 
preference test. The 200 words were divided into two tests of 100 words, 
each test containing 10 versions of each stimulus condition. The same 
ten subjects were used in the intelligibility test as in the preference test. 
The two tests were given on separate days to all ten subjects. 

Table III. Analysis of Variance of Preference Data 
* N.S. => not significant above 0.90 level. 



V. RESULTS OF INTELLIGIBILITY TEST 

Table IV shows the average error scores as a function of the number 
of bits/sample, and the type of quantization. (The notation of the 
previous section is used again here.) These data are averaged over 
subjects and tests. This table shows that at 2 bits/sample, the PCM 
system has an error rate of 59.5 percent as opposed to 76 percent for 
the dither system, i.e., a decrease of 16.5 percent in word intelligibility 
due to consonant masking. At 3 bits/sample, the PCM system has an 
error rate of 34.5 percent whereas the dither system has an error rate 
of 46.5 percent. Thus even at 3 bits/sample, the masking of the dither 
noise reduces word intelligibility by about 12 percent. At 4-6 bits/sample, 
the dither system has lower error rates than the PCM system, the 
differences being 10 percent at 6 bits/sample, 1.5 percent at 5 bits/sample 
and 0.5 percent at 4 bits/sample. Thus only at 6 bits/sample is the 
error rate difference significant. The data of Table IV are plotted in 
Fig. 3 to show how the error rate varies with the number of bits/sample 
for the two systems. 
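
The arithmetic behind these comparisons can be checked with a short sketch. The error rates are copied from Table IV; the variable names are hypothetical, and the sign convention (PCM error minus dither error, so that a positive value favors dither) follows the Difference column of the table.

```python
# (PCM error %, dither error %) for each number of bits/sample, from Table IV.
error_rates = {
    2: (59.5, 76.0),
    3: (34.5, 46.5),
    4: (29.5, 29.0),
    5: (25.0, 23.5),
    6: (16.5, 6.5),
}

# Positive difference: the dither system makes fewer word errors than PCM.
differences = {bits: pcm - dither for bits, (pcm, dither) in error_rates.items()}

# Mean intelligibility loss due to dither at the two lowest bit rates.
low_rate_loss = -(differences[2] + differences[3]) / 2

# Percentage of words correct for straight PCM at 6 bits/sample.
pcm_correct_at_6 = 100.0 - error_rates[6][0]

for bits, diff in differences.items():
    print(f"{bits} bits/sample: difference = {diff:+.1f} percent")
print("mean loss at 2-3 bits/sample:", low_rate_loss)
print("PCM percent correct at 6 bits/sample:", pcm_correct_at_6)
```

The mean loss at 2 and 3 bits/sample works out to 14.25 percent, and the PCM system is 83.5 percent correct at 6 bits/sample.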

A complete analysis of variance was performed on the raw data of 
the intelligibility test. The four factors used in the analysis (and the 
number of levels of each factor) were: 

(i) number of bits/sample (5) 
(ii) type of quantization (2) 
(iii) subjects (10) 
(iv) repetitions (2) 

Table IV. Word Error Scores Averaged Over Subjects and Repetitions 

Number of Bits        Error Rate 
per Sample          PCM      Dither     Difference 
     2             59.5%     76%         -16.5% 
     3             34.5%     46.5%       -12% 
     4             29.5%     29%           0.5% 
     5             25%       23.5%         1.5% 
     6             16.5%     6.5%         10% 

Fig. 3. The percentage error for word intelligibility as a function of the number of 
bits/sample for straight PCM and dither systems. 

The results of the analysis are shown in Table V. The most significant 
factor was, of course, the number of bits/sample. The next most 
significant factors were subjects, repetitions, bits/sample × type of 
quantization, and bits/sample × repetitions. These results indicate a fairly 
large amount of learning between repetitions 1 and 2, as well as a lack 
of consistency between the intelligibility scores of the different subjects. 

VI. CONCLUSIONS 

The results of the preference test were quite encouraging in that 
subjects uniformly showed strong preferences for dithered speech over 
straight PCM encoding at all bit rates employed in the experiment. 
At the lower bit rates, the preference for dithered speech over higher 
bit rate PCM encoded speech presents strong evidence for the per- 
ceptibility and annoyance of highly correlated quantization noise. 

The word intelligibility tests showed that the wideband uncorrelated 
dither noise tended to mask the consonants more than the correlated 
PCM noise thereby reducing word intelligibility by about 14 percent 
at low bit rates. At the higher bit rates used in the experiment, there 
was no decrease in word intelligibility for the dither system, and, in 
fact, at 6 bits/sample, the dithered words were 10 percent more intel- 
ligible than the straight PCM encoded words. Since the average per- 
centage correct for the PCM system was 83.5 percent, an increase of 
10 percent is a significant increase in intelligibility. 

Overall, these experiments indicate that the use of dither noise in 
the range of 4-6 bits per sample is beneficial: dithered speech is 
preferred to straight PCM, with no loss in word intelligibility. 

Table V. Analysis of Variance of Intelligibility Data 
N.S. => not significant above 0.90 level. 